12/16/2019

Introduction

The purpose of this MA615 final project is to gain hands-on experience with the Yelp dataset. Yelp is an app for rating restaurants and stores that is widely used in the United States. My research goal is to explore and analyze the 50 most common stores in Las Vegas in the Yelp data. The project covers exploratory data analysis, mapping, text mining, and sentiment analysis.

Exploratory Data Analysis

The following bar plot shows the 50 most common stores in Las Vegas in the Yelp dataset. The most common is Starbucks with 145 locations, followed by Subway (120 locations), McDonald’s (70 locations), and 7-Eleven (70 locations). I analyze both all 50 stores together and each store individually.
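The ranking behind a plot like this can be sketched as a simple count of business names within one city. The following is a minimal Python illustration, not the code used for this project; the sample records and the field names `name` and `city` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical records for illustration only; the real Yelp business
# data contains many more rows and fields.
businesses = [
    {"name": "Starbucks", "city": "Las Vegas"},
    {"name": "Subway", "city": "Las Vegas"},
    {"name": "Starbucks", "city": "Las Vegas"},
    {"name": "Starbucks", "city": "Henderson"},
    {"name": "McDonald's", "city": "Las Vegas"},
]

def top_stores(records, city, n=50):
    """Return the n most frequent business names in the given city."""
    counts = Counter(r["name"] for r in records if r["city"] == city)
    return counts.most_common(n)

top = top_stores(businesses, "Las Vegas")
print(top)
```

Each pair in the result is a store name and its number of locations, which is what the bar plot displays.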

Mapping for Starbucks

The following map shows where the 145 Starbucks locations are in Las Vegas. Many of them are on Las Vegas Boulevard, which is the center of the city.

Maps of all 50 stores together and of each store individually are available as interactive maps in the Shiny application.

Text Analysis on Customer Review

The following bar plot shows the 50 most common words customers use in their reviews of stores in Las Vegas. Service, time, food, location, and customer are the most used words, which matches what I would expect people to mention when reviewing a store.
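A word-frequency count of this kind can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's actual code: the two sample reviews are invented, and the stop-word list is a tiny placeholder for the fuller list a real analysis would use.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "to", "of", "in", "but"}

def top_words(reviews, n=50):
    """Lowercase and tokenize each review, drop stop words, count the rest."""
    words = []
    for text in reviews:
        tokens = re.findall(r"[a-z']+", text.lower())
        words.extend(t for t in tokens if t not in STOP_WORDS)
    return Counter(words).most_common(n)

# Invented example reviews.
reviews = [
    "The service was great and the food was fast.",
    "Slow service, but the location is convenient.",
]
word_counts = top_words(reviews)
print(word_counts[:3])
```

The same counts can feed either a bar plot or a word cloud, since both only need each word and its frequency.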

Text Analysis on Customer Review Continued

The following word cloud shows another way to visualize the 50 most common words customers use in their reviews.

Text Analysis on Customer Review for Starbucks

The following word cloud shows the 50 most common words customers use in their reviews of Starbucks in Las Vegas. Word clouds for all 50 stores together and for each store individually are available interactively in the Shiny application.

Text Analysis on Categories Variable

The Categories variable in the data classifies each store into one or more types. I did a text analysis on the Categories variable to see what types the 50 most common stores in Las Vegas belong to. The following bar plot shows the top 50 category types in the Categories variable for stores in Las Vegas. It is not surprising that the top two are food and restaurants. Even though Yelp has expanded its service beyond restaurants, they remain its main focus.

Text Analysis on Categories Variable Continued

The following word cloud shows another way to visualize the top 50 Category types in the Categories variable of stores in Las Vegas.

Sentiment Analysis

I used the Bing lexicon, developed by Bing Liu and collaborators, for the sentiment analysis of customer reviews. The Bing lexicon categorizes words in a binary fashion as positive or negative. The following plot shows the positive and negative word counts in the reviews of each of the 50 stores. Bars to the left of the 0 line are negative, and bars to the right are positive. It surprised me that, for each store, the frequencies of positive and negative words are similar.

Sentiment Analysis Continued

The following plot shows the net sentiment (positive minus negative word counts) for each of the 50 stores. Capriotti’s Sandwich Shop, Starbucks, and The Coffee Bean & Tea Leaf have the most positive net sentiment, whereas McDonald’s, USPS, and Pizza Hut have the most negative.
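Lexicon-based scoring of this kind can be sketched in a few lines. The Python example below is only an illustration: the six-word `LEXICON` is a toy stand-in for the much larger Bing lexicon, and the store names and review words are invented.

```python
from collections import Counter

# Toy stand-in for the Bing lexicon (an assumption for this sketch);
# the real lexicon labels thousands of words as positive or negative.
LEXICON = {
    "great": "positive", "friendly": "positive", "fresh": "positive",
    "slow": "negative", "rude": "negative", "cold": "negative",
}

def sentiment_counts(words):
    """Count the words that the lexicon labels positive or negative."""
    counts = Counter(LEXICON[w] for w in words if w in LEXICON)
    return {"positive": counts["positive"], "negative": counts["negative"]}

def net_sentiment(words):
    """Net sentiment = number of positive words minus number of negative words."""
    counts = sentiment_counts(words)
    return counts["positive"] - counts["negative"]

# Invented review words for two hypothetical stores.
review_words = {
    "Store A": ["great", "friendly", "slow", "coffee"],
    "Store B": ["rude", "cold", "fresh", "slow"],
}
for store, words in review_words.items():
    print(store, sentiment_counts(words), net_sentiment(words))
```

Words that do not appear in the lexicon, such as "coffee" here, are ignored, so the net score depends only on the sentiment-bearing words.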

Sentiment Analysis on Starbucks

The following plot shows the 10 most common positive and negative words people use when reviewing Starbucks. The same plot for each of the other stores is available interactively in the Shiny application.

Comparison Cloud for Starbucks

The following comparison word cloud contrasts the negative and positive words people use in their reviews of Starbucks.

Sentiment Ratio for Starbucks

The following table shows the ratio of positive words and of negative words to the total number of positive and negative words for Starbucks. The two ratios are not far apart. The same table for each of the other stores is available interactively in the Shiny application.

Name        Sentiment   Frequency   Ratio
Starbucks   positive    17257       0.5492711
Starbucks   negative    14161       0.4507289
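
The ratios in the table follow directly from the two frequencies: each count is divided by the sum of the positive and negative counts. A short Python check of that arithmetic, using only the numbers reported above (the variable names are my own):

```python
# Frequencies taken from the Starbucks table above.
counts = {"positive": 17257, "negative": 14161}

# Total number of sentiment-bearing words (positive plus negative).
total = sum(counts.values())

# Share of each sentiment among all sentiment-bearing words.
ratios = {sentiment: n / total for sentiment, n in counts.items()}

for sentiment, ratio in ratios.items():
    print(f"{sentiment}: {ratio:.7f}")
```

Because the denominator is the sum of the two counts, the two ratios always add up to 1.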